Useful statistics for corpus linguistics
Abstract
• frequencies of occurrence of linguistic elements, which can be studied from two different perspectives:
  o how frequent are morphemes, words, or patterns/constructions in (parts of) a corpus? This information can be provided in various forms of frequency lists;
  o how evenly are morphemes, words, or patterns/constructions distributed across (parts of) a corpus? This information can be provided in the form of various dispersion statistics;
• frequencies of co-occurrence: how often do linguistic elements such as morphemes, words, or patterns/constructions co-occur with another element from this set or with a position in a text?
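A frequency list answers the first question above. The following is a minimal sketch, assuming a naive lowercase, letters-only tokenizer; the helper names (`tokenize`, `frequency_list`, `per_million`) are hypothetical and not taken from the full text. It counts raw frequencies and also normalizes them per million tokens, one common form of frequency list that makes corpora of different sizes comparable.

```python
from collections import Counter
import re


def tokenize(text: str) -> list[str]:
    """Naive tokenizer (an assumption for this sketch): lowercase the text
    and keep runs of ASCII letters and apostrophes as word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(tokens: list[str]) -> list[tuple[str, int]]:
    """Return (word, raw frequency) pairs, most frequent first;
    ties are broken alphabetically so the output is deterministic."""
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def per_million(freq: int, corpus_size: int) -> float:
    """Normalize a raw frequency to occurrences per million tokens."""
    return freq / corpus_size * 1_000_000


if __name__ == "__main__":
    tokens = tokenize("The cat sat on the mat. The dog sat on the log.")
    for word, freq in frequency_list(tokens):
        # word, raw frequency, frequency per million tokens
        print(f"{word}\t{freq}\t{per_million(freq, len(tokens)):.0f}")
```

Real corpus work would replace the regular-expression tokenizer with one suited to the language and annotation of the corpus; the counting and sorting steps stay the same.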
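Two items can have the same frequency yet be distributed very differently across a corpus, which is what dispersion statistics capture. One widely used measure is Gries's DP ("deviation of proportions"); this abstract does not say which dispersion statistics the full text covers, so treat the sketch below as an illustration only. It assumes the corpus has already been split into parts and that the item's frequency in each part has been counted; `deviation_of_proportions` is a hypothetical helper name.

```python
def deviation_of_proportions(freq_per_part: list[int], part_sizes: list[int]) -> float:
    """Gries's DP dispersion measure.

    observed_i: share of the item's occurrences that fall into corpus part i
    expected_i: share of all corpus tokens that corpus part i contains
    DP = 0.5 * sum_i |observed_i - expected_i|
    0 means the item is spread exactly in proportion to the part sizes;
    larger values mean it is concentrated in fewer parts.
    """
    if len(freq_per_part) != len(part_sizes):
        raise ValueError("need one frequency and one size per corpus part")
    total_freq = sum(freq_per_part)
    total_size = sum(part_sizes)
    if total_freq == 0 or total_size == 0:
        raise ValueError("item must occur at least once and parts must be non-empty")
    return 0.5 * sum(
        abs(f / total_freq - s / total_size)
        for f, s in zip(freq_per_part, part_sizes)
    )


if __name__ == "__main__":
    sizes = [1000] * 5            # five equally sized corpus parts (hypothetical data)
    even = [10, 10, 10, 10, 10]   # 50 occurrences, spread evenly
    clumped = [50, 0, 0, 0, 0]    # 50 occurrences, all in one part
    print(round(deviation_of_proportions(even, sizes), 3))     # 0.0
    print(round(deviation_of_proportions(clumped, sizes), 3))  # 0.8
```

Both hypothetical items occur 50 times, so a frequency list alone would not distinguish them; DP separates the evenly spread item from the clumped one.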
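The simplest case of co-occurrence is two words appearing directly next to each other. The sketch below counts adjacent word pairs (bigrams) and, as one possible association measure swapped in for illustration, computes pointwise mutual information (PMI) from relative frequencies. The abstract does not name the association measures the full text discusses, and `bigram_pmi` is a hypothetical helper; co-occurrence within wider windows or with text positions would need different counting.

```python
import math
from collections import Counter


def bigram_pmi(tokens: list[str], min_freq: int = 1) -> dict[tuple[str, str], tuple[int, float]]:
    """Map each adjacent word pair to (co-occurrence frequency, PMI).

    PMI(w1, w2) = log2( p(w1 w2) / (p(w1) * p(w2)) ), with probabilities
    estimated as relative frequencies. Pairs seen fewer than min_freq
    times are skipped.
    """
    if len(tokens) < 2:
        return {}
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = len(tokens) - 1
    scores = {}
    for (w1, w2), f12 in bigram_counts.items():
        if f12 < min_freq:
            continue
        p12 = f12 / n_bigrams
        p1 = unigram_counts[w1] / n_unigrams
        p2 = unigram_counts[w2] / n_unigrams
        scores[(w1, w2)] = (f12, math.log2(p12 / (p1 * p2)))
    return scores


if __name__ == "__main__":
    tokens = "strong tea and strong coffee and weak tea".split()
    ranked = sorted(bigram_pmi(tokens).items(), key=lambda kv: -kv[1][1])
    for pair, (freq, pmi) in ranked:
        print(pair, freq, round(pmi, 3))
```

PMI is known to give inflated scores to rare pairs, which is why the sketch exposes a `min_freq` threshold; on real data a cut-off above 1 is usually advisable.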
Similar resources
Do We Need Discipline-Specific Academic Word Lists? Linguistics Academic Word List (LAWL)
This corpus-based study aimed at exploring the most frequently used academic words in linguistics and comparing the word list with the distribution of high-frequency words in Coxhead’s Academic Word List (AWL) and West’s General Service List (GSL) to examine their coverage within the linguistics corpus. To this end, a corpus of 700 linguistics research articles (LRAC), consisting of approximately ...
Automatic Processing of Large Corpora for the Resolution of Anaphora References
Manual acquisition of semantic constraints in broad domains is very expensive. This paper presents an automatic scheme for collecting statistics on co-occurrence patterns in a large corpus. To a large extent, these statistics reflect semantic constraints and are thus used to disambiguate anaphora references and syntactic ambiguities. The scheme was implemented by gathering statistics on the ou...
Online statistics labs
Recent publications in the field of corpus linguistics (including several in this and the previous issue of CLLT) strongly indicate that the field is on its way from a view of corpora as mere repositories of authentic data from which examples can be culled ad libitum to a methodology that analyzes linguistic phenomena systematically and exhaustively as they manifest themselves in corpus data. T...
Extracting Syntax Statistics from Large Corpora of Written English
The field of linguistics has seen a growing interest in the statistics of everyday language. In studying how we acquire language and why some of its aspects are more difficult for us than others, it is critical to understand the linguistic environment to which we are exposed. However, gathering statistics over syntactic structures, even with a syntactically tagged corpus, can be difficult and t...